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STATISTICAL ANALYSIS ON HIGH-DIMENSIONAL SPHERES 
AND SHAPE SPACES 1 

By Ian L. Dryden 

University of Nottingham 

We consider the statistical analysis of data on high-dimensional 
spheres and shape spaces. The work is of particular relevance to ap- 
plications where high-dimensional data are available — a commonly 
encountered situation in many disciplines. First the uniform measure 
on the infinite-dimensional sphere is reviewed, together with connec- 
tions with Wiener measure. We then discuss densities of Gaussian 
measures with respect to Wiener measure. Some nonuniform distri- 
butions on infinite-dimensional spheres and shape spaces are intro- 
duced, and special cases which have important practical consequences 
are considered. We focus on the high-dimensional real and complex 
Bingham, uniform, von Mises-Fisher, Fisher-Bingham and the real 
and complex Watson distributions. Asymptotic distributions in the 
cases where dimension and sample size are large are discussed. Ap- 
proximations for practical maximum likelihood based inference are 
considered, and in particular we discuss an application to brain shape 
modeling. 

1. Introduction. Applications where high-dimensional data are available 
are routinely encountered in a wide variety of disciplines. Hence the study 
of suitable probability distributions and inferential methods for analyzing 
such data is very important. A practical application that we shall consider 
is cortical surface modeling from magnetic resonance (MR) images of the 
brain. 

Consider the situation where we have a high-dimensional observation x p 
on the unit sphere in p real dimensions = {x p : ||x p || = 1}. We wish to 

consider modeling x p as p — > oo, and the observation tends to a function of 
some kind (a generalized function), which is represented by a point on the 
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infinite-dimensional sphere We investigate appropriate probability 

distributions and statistical inference for this situation. 

The unit norm constraint often arises naturally in high-dimensional data 
analysis; for example, if Z ~ N p (0, I p /p), where I p is the p x p identity ma- 
trix, then \\Z\\ = 1 + O p (p~ 1 ^ 2 ) and hence as p — > oo we regard Z as a point 
on S°°(l) almost surely. 

The unit norm constraint is also commonly used in shape analysis, where 
one requires invariance under scale changes, as well as location and rotation. 
Also, the constraint arises in the analysis of curves. For example, a dataset 
may have been recorded at arbitrary scales, and it is the general shapes of 
the curves that are of interest. A common approach to dealing with this 
problem is to rescale each curve to have unit norm. The models we consider 
are for generalized functions but they may also be of relevance to functional 
data analysis (FDA) (e.g., see [24]). However, in FDA various additional 
continuity and smoothness assumptions are usually made. 

Statistical analysis on the infinite-dimensional sphere is not straightfor- 
ward. For example, surface area {S' p_1 (l)} — > as p — > oo even though the 
radius is fixed at 1. In order to define a uniform measure and other distribu- 
tions on the infinite-dimensional sphere one can use a relation with Wiener 
measure. 

In Section 2 we review the Wiener measure and its connection with the 
infinite-dimensional sphere. Work on densities of Gaussian measures with re- 
spect to Wiener measure is also discussed. In Section 3 we define a nonuni- 
form measure on the infinite-dimensional sphere. We show that particu- 
lar high-dimensional Bingham and high-dimensional zero-mean multivari- 
ate normal distributions have this distribution in the limit as the dimension 
p — > oo. In Section 3.3 we describe maximum likelihood based inference, and 
in particular we discuss practical implementations. Asymptotic distributions 
in the cases where dimension and sample size are large are also discussed. In 
Section 4 we make connections with existing results and provide extensions 
for the high-dimensional uniform, von Mises-Fisher and Watson distribu- 
tions, and we discuss the Fisher-Bingham distribution. We also investigate 
the high-dimensional complex Bingham and complex Watson distributions, 
which have important applications in shape analysis. In Section 5 we discuss 
an application to cortical surface analysis from medical images of the brain, 
and finally we conclude with a brief discussion. 

2. Wiener measure and Gaussian measures. 

2.1. Wiener measure and the infinite- dimensional sphere. Let C = {w € 
C[0, 1] : u>(0) = 0} be the set of continuous paths on [0, 1] starting at 0. When 
considering an observation x p = (x p (l), . . . , x p (p)) T on a high-dimensional 
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sphere S p 1 ( 1 ) it will also be useful to construct the following path defined 
on C: 

k 

(1) Q p {x p ,k/p) =J2x p (i), 

i=l 

where Q p (x p ,0) =0 and Q p (x p ,t) is linearly interpolated between (k — 
l)/p <t < k/p, k = 1, . . . ,p. If x p is uniformly distributed on 5 P_1 (1), then 
Q p tends to the Wiener process (i.e., Brownian motion) on C as p— ► oo [8]. 
Hence, the uniform measure on S°°(l) is related to the Wiener measure on 
C. Despite a relatively recent rigorous proof, the connection between Wiener 
measure and the uniform measure on S°°(l) has a long history starting with 
Poincare [23] and Wiener [30]. 

The formal sense in which the Wiener measure is related to the uniform 
distribution on S°°(l) is now described. The Wiener process is written as 
W = {W(t) :t E [0, 1]}. The Wiener measure on C is the probability measure 
given by 

„ w ({W:W {t ) - W(s) E D}) = (2T(t ^ ))1/2 / D ^(^)) *» 

for s < t and a Borel set DCR, and the disjoint increments Wit) — W(s) of 
paths in C are independent. Let fis,p be the uniform probability measure on 
the finite-dimensional sphere Then consider the probability mea- 

sure fiw,p on C of a Borel set D, 

VW, P ( D ) = VS, P {{ X P '■ Qp( x P> •) G £>})■ 
Theorem 2.1 ([8]). nw,p ~ > weakly asp^oo. 

Hence, we can think of the uniform distribution on ^(l) as inducing 
the Wiener measure on C. If X = {X(t) :t E [0, 1]} is uniformly distributed 
on S°°(l), then the induced path Y (the indefinite integral of X) on C is 
the Wiener process, and we write Y ~ W. Note that X is not a standard 
stochastic process [since Wit) is nowhere differentiable] , but rather X is a 
generalized function or generalized random field [11], which is also known as 
a Schwarz distribution. The generalized random field X in the uniform case 
here is known as white noise [13] and we write X ~ W to mean X is white 
noise. Note that the induced path on C given by the indefinite integral of 
white noise is defined, even though pointwise values of X(t) are not. Hence, 
the induced path on C is a standard stochastic process and it is often more 
straightforward to work in the induced space of the continuous paths. Note 
that in our work it is the white noise that satisfies the unit norm constraint, 
not the induced path process. We shall reserve the notation X(t) and U(t) 
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for generalized functions on S°°(l), and Y(t) and W(t) for the induced path 
processes on C. 

We can also regard white noise as a limit of a standard multivariate 
normal distribution as the dimension increases. From the definition of the 
Wiener process, if z p ~ N p (0, I p /p), then the path Q p (z p ,-) — > W (Wiener 

process) as p — > oo, where means convergence in distribution (i.e., weak 

convergence) . We shall also write z p VV (white noise) as p — > oo in this 
case. 

In Section 1 we noted that ||z p || = 1 + O p (p~ 1 ^ 2 ), and so z p is approxi- 
mately on S p (1) for large p. This observation can be seen using ^ 
Xp/p, where x p is a chi-squared random variable with p degrees of freedom. 
Therefore £[||^|| 2 ] = 1, var(||z p || 2 ) = 2/p and so \\z p \\ 2 = 1 + O p {p~ 1 / 2 ) and 
||^|| = l + Op(p-V2). 

2.2. Gaussian measures. Shepp [25] discussed absolute continuity and 
probability density functions of Gaussian measures with respect to Wiener 
measure. Consider the Gaussian measure ii m ,R on C with mean 



m(t) = Y(t)dn m>R {Y) 

J — oo 

and covariance function 

/oo 
(Y( s ) - m(s))(Y(t) - m(t)) d^ R (Y). 
-oo 

Let L 2 ([0, 1]) be the space of Lebesgue square integrable functions on [0, 1] 
and let C 2 be the space of Lebesgue square integrable functions on [0,1] x 
[0,1]. 

Theorem 2.2 ([25]). The Gaussian measure [i m .R * s absolutely contin- 
uous with respect to Wiener measure if and only if: 

(i) there exists a kernel K £ C 2 for which 



R(s,t) = min(s,t) — / / K(u, 
Jo Jo 



v) dudv, 



(ii) the eigenvalues aj of K all satisfy a,j < 1, 
(hi) there exists a function n £ L 2 ([0, 1]) for which 

m{t) = / r](u) du. 



o 



The kernel K is unique and symmetric and is given by —d 2 R/dsdt for 
almost every (s,t). The function r\ is unique and is given by rj(t) = dm(t)/dt 
for almost every t. 
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Denote the complete orthonormal eigenfunctions of K as 71 , 72 , . . . , 700 



corresponding to eigenvalues a±, 02, . . . , a^. Since K £ £? we have X)?^=i a ? < 00 • 



Let Me£ 2 have the same eigenfunctions as if and corresponding eigenval- 
ues 1 — (1 — a,) 1 / 2 , where ou < 1. Define 

/(«) = f M(s,u)dW(u) 

and 







y(t) = W(i)- [ I(s)ds + rn(t), 
Jo 

where Wit) is the Wiener process on C. Note 

=m(f), cov(y(s),y(t)) = min(s,t) - /"* f K(u,v)dudv. 

Jo Jo 

Theorem 2.3 ([25]). Let \x m ,R be absolutely continuous with respect to 
Wiener measure. The probability density function of Y = {Y(t) :t £ [0, 1]} 
with respect to Wiener measure is 

R (Y) = f G (Y;m,R) 



duw 
(2) 



n{(i- % )-^exp{-ffi^! + i^ 



where Yj = Jq 1 7j(i) dY(t) is the Wiener integral evaluated at Y, and r]j = 

PROOF. This follows directly from [25], equation (4.8). Since Yl'jLi < 
00 this product converges, and since all aj < 1 the product is nonzero. □ 



Note that (2) is also known as the Radon-Nikodym derivative or likeli- 
hood ratio. 



2.3. Sequences of matrices. Consider the positive-definite self-adjoint lin- 
ear operator £ with eigenvalues Ai > A2 > • • • > Aqo > 0, and orthonormal 
eigenfunctions 71 , 72 , • • • , 7oo which form a complete orthonormal basis in 
L 2 ([0, 1]). From the spectral decomposition theorem 

00 

S = 5Z A i^' ><7i. 
j'=i 
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where >< is the outer product. We shall define a particular sequence of 
matrices which converges to the self-adjoint linear operator S, and this se- 
quence imposes some extra structure on S. Consider the p x p symmetric 
matrices with full rank: 

i=i 

where A^ > A^ > • • ■ > X^ > are the eigenvalues of £ p , with corre- 

(p) 

sponding eigenvectors given by Jj ,j = l,...,p. We shall consider sequences 
of symmetric positive definite matrices £ p , p= l,2,...,oo, which have the 
properties 

(3) X < f ) ^X J >0, if^lj asp-oo,j = l,...,p, 



(4) J2xf=p + 0(1), 

3=1 

(5) j2(xff=p + 0{l). 

i=i 

From (3) £ p — > T, as p — > oo, where X is a positive-definite self-adjoint linear 
operator. We write 

(6) lim (/„ — £„) = K, aj = 1 — A,-, 

p-^oo 

where if is a self-adjoint linear operator and a 3 - < 1. From (3) and (4) we 

have J2T= i °i = From ( 5 ) ££=1 a ) < °°> and hence K e £ 2 - 
We also consider a reparameterization 

(7) s^^-s; 1 ), 

where B p has eigenvalues /jj 3 ^ = ^(1 — l/Xj), j = 1, . . . ,p. 

Example. An example of a sequence that satisfies (4) and (5) is where 
the eigenvalues of £ p are 

(8) X^>X^>...>X^>X^ 1 = ... = X^ = 1, 

and Xj ' = 0(1), j = 1, . . . , h, that is, the smallest p — h eigenvalues of £ p 
are equal to 1, where 1 < ft, < oo is fixed. 



3. Nonuniform distributions and the Bingham distribution. 
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3.1. Nonuniform distributions on ^(l). In order to consider modeling 
on S°°(l) we need to define useful nonuniform distributions. Let us con- 

1/2 

sider the generalized function X = linip^oo T, p u p , where u p is uniformly dis- 
tributed on SP-^l), Sp /2 = ££L i(Af ) ) 1/2 lf ) >< with eigenvalues and 
eigenvectors constructed as in Section 2.3. The noise X induces a nonuniform 

1/2 

distribution for the limiting path Y = lim J ,_ i . tX3 u p ,-) E C in general 

with respect to Wiener measure on C, and we write Wo,£ for this process 
on C. The noise X itself is not white noise in general on S°°(l). We write 
X ~ VVo,s for this generalized function and we note that W = VVo,j, where 
/ is the identity linear operator. 

1/2 

Proposition 3.1. If X = linip^oo S p ' u p ~ Wo,e, ^en i/ie induced mea- 
sure of a Borel set D £C is fj,Q^(D), the zero-mean version of the Gaussian 
measure defined in Section 2.2. The probability density function of the in- 

1/2 

duced process Y = linip^oo Q P (T, P u p , ■) 6 C with respect to Wiener measure 
is 

(9) = f G (Y; 0,S) = n«l - a,)- 1 / 2 e - a ^ 2/{2(1 -^ )} }, 
dfj, w £± 

where Yj = Jq jj(t) dY(t) is the Wiener integral evaluated at Y. 

Proof. If u p is uniform on S p ~ 1 (l), then we know that the path Q p (u p , •) 

1/2 

as p — ► oo. Hence, y p = <5 P (S P tt p , •) — ► F G C, where Y is the Gaussian pro- 
cess given by 

(10) Y(t) = W(t) - f f 1 M(s, u) ds dW(u), 

Jo Jo 

so 

(11) E[Y(t)]=0, cav(Y(s),Y(t)) = min(s,t) - /" f K(u,v)dudv, 

Jo Jo 

and the relation between S p and K is given by (6). Hence, the induced 
measure on C is /xo,s and the density follows from Theorem 2.3. □ 

The noise Wo,s can also be regarded as a limit of zero-mean multivariate 
normal distributions, as shown in the next results. 

Proposition 3.2. Under assumptions (4) and (5), ifv p ~ N p (0,E p /p), 
then \\v p \\ = l + O p {p~ l l 2 ). 
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Proof. This result follows from the properties of the multivariate nor- 
mal distribution and because trace(£ p ) =p + 0(l) and trace(Sp) = p + 0{\). 
Hence, 

E[\\v p f}=p- 1 tT&ce(X p ) = l + 0(p- 1 ), 
var(||v p || 2 ) = 2p- 2 trace(£ 2 ) = 0{p~ l ). 

Therefore, \\v p \\ 2 = 1 + O p (p' 1/2 ) and hence \\v p \\ = 1 + O p (p~ 1/2 ). □ 

So, for finite p, the point v p does not lie on 5 P_1 (1) but will be close for 
large p. 

Proposition 3.3. Under assumptions (3)-(5), ifv p ~ N p (0, T> p /p), then 
v p VVo,e, as p^oo. 

PROOF. Note z p = Hp^ 2 v p ~ N p (0,I p /p) W as p —> oo. Hence the 

1 /2 

path y p = Q p (E p ' z p , •) — ► F G C, where Y is the Gaussian process given by 

(10) and (11). Hence, y p —> Wo,s and so v p -5- VVo,s as p ^ oo, as required. 
□ 

3.2. The Bingham distribution. Let us define the Bingham (pB p ) family 
of distributions on S p ~ 1 (l) to have probability measure 

d^B, P ,T, = CEsipBp)' 1 exp(pXpB p x p ) dfi s , P , 

where x p £ S p ~ x (l), B p is given in (7) and 

(12) c B (pB p ) = 1 F 1 (±^,pB p ^ 

is the confluent hyper geometric function with matrix argument (e.g., see 
[21], page 181). The addition of an arbitrary constant to the eigenvalues 
of pB p does not change the particular Bingham distribution. So to ensure 
identifiability we fix the minimum eigenvalue of B p at 0, which is equivalent 

to fixing the minimum eigenvalue of S p to be 1, that is, X p p ^ = 1. From (7) 
d^B, P ,T, = c B {pB p )~ l e p/2 exp(--x p E p 1 x p ) dfi s , p 

(13) 

— fp\ x pi ^p) d/j,s, P , 

say. The Bingham distribution is often used for modeling axial data in direc- 
tional data analysis, where the directions x p and indistinguishable 
(see [21], page 180). If > ^2 > then the mode of the distribution is 71. 
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We regard jj as the (j — l)st principal component (PC) (J > 2). The Bing- 
ham (pB p ) distribution is the iV p (0, S p /p) distribution conditioned to have 
unit norm. 

Chikuse ([6] and [7], Chapter 8) has considered high-dimensional asymp- 
totic results for the Bingham distribution, the matrix Bingham and other 
nonuniform distributions on spheres and Stiefel and Grassman manifolds. 
We discuss one of her results in particular for the finite-dimensional projec- 
tion of the high-dimensional Bingham distribution. Let Ph = [ei, . . . , e^] be 
a p x h (p> h) matrix of orthonormal columns with properties P^Ph — Ih 
and PhPfrXp = x v , where x v is the projection of x p into the /i-dimensional 
subspace generated by the columns of Ph- 

Theorem 3.4 ([6]). If x p has a Bingham distribution with parameter 
matrix pB p and T, p = (I p — 2B p ) _1 is positive definite, then 

p x l 2 PlY>- x l 2 P h Plxp = p x l 2 Pl(Ip - 2Bp) x l 2 P h Plxp 3 N h (0,I h ) 
as p — > oo . 

Proof. Chikuse ([6], Theorem 4.5) used an asymptotic expansion of the 
joint distribution of the components for the matrix Bingham distribution on 
the Stiefel manifold V p ±. Since V P) i = S' p_1 (l), the k = 1 case is of interest. 
In particular, 

lkn 1 F 1 Q, V -,pPlB p P^\ = |4 - 2PlB p P h \' x l 2 
leads to the required result. □ 
Note that 

px T v P h Pl Y,~ x P h Pl Xp ^ xt, 

as p — > oo. Chikuse [7] also provides higher-order terms in the approximation 
of Theorem 3.4, and many other finite projection results. We wish to examine 
the distribution of x p in the continuous limit as p — > oo . 

Proposition 3.5. Define Q p (x p ,-) as in (1). Consider the Bingham 
probability measure [i>w,p,Y> on C of a Borel set D given by Hw,p,Y,{D) = 
fJ-B,p,T:({xp : Qp(x p , ■) G D}), where PB,p,Y> is defined in (13) and the sequence 
S p satisfies (3)-(5) with S p — ► S. Then Hw,p,T: — > Mo,s weakly as p — > oo. 

Proof. Let g:C -^M be a bounded continuous function. Define 

E p[a] = / g(Q P (xp,-))dnw,p,j: 
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g(Qp(x p , -))dnB,p,T, 
g(Q P (x p , -))fp(xp, S p ) dns, P 



g(Q p (x p , -))fp(x p , Ep) cfyttf/p 

-> J c g(Y)fG(Y;0,^)dfi w asp^oo, 

because F = linip^oo Qp(xp, •), linip^oo pXpB p x p = YljLi -a jYf /{2(1 - aj)}, 
where lj = Jq 1 7j(i) c?Y"(i) and fJ,w,p —> weakly as p — ► oo. Note that 
fa{Y; 0, £) is given by (9) and 0, S) = ^o,s- So 

#pb] / 9(Y)dfi ,s. 
Jc 

Hence, we have shown weak convergence //w,p,£ ~~ ► W),£ as p — > oo. □ 

We can consider x p ~ Bingham(pi?p) — ► Wo,s as p —> oo. From the above 
results a practical approximation is that, for large p and under assumptions 
(3)-(5), 

(14) Bingham(p5p) « ^(O,^ 1 ^ - 2-Bp) x ) = N p (0, S p /p). 

Since there is a constraint ||:Ep|| = 1 under the Bingham distribution, the 
approximation will be best when a singular multivariate normal distribution 
is used with p—1 dimensions of variability; see Section 4.6 for a comparison 
in an example. 

3.3. Inference. Let x p i G S p ~ 1 (l), i = 1, . . . ,n, denote a random sample 
from the Bingham distribution of (13). The log-likelihood is 

n 

l(Xpi, . . . , Xp n \Yip/p) = log fp(Xpi, Tip) 



i=l 



n log c B (pB p ) +pY^ XpiBpX 



p-^pn 



i=i 



where cb is defined in (12). The maximum likelihood estimators (m.l.e.'s) of 
the eigenvectors of B v are given by the eigenvectors of ^ Ya=i x pi x pi, Du t the 
m.l.e.'s of the eigenvalues must be obtained using numerical optimization, 
working with the difficult normalizing constant cb(pB p ). Kume and Wood 
[18] provide a saddlepoint approximation. 

For large p, from (14) we can use the normal approximation x p i « -/Vp(0, (I — 
2B p )~ 1 jp) = Np(0, Tip/p). Hence, the m.l.e. of S p is approximately S p = 
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£ Ya=i x pi x pi> which has (exact Bingham m.l.e.) eigenvectors 71, . . ■ ,7 P cor- 
responding to (approximate Bingham m.l.e.) eigenvalues Ai > A2 > • • • > 
A p > 0, and we write 

Ld j = \ j /p, j = l,...,p. 

The m.l.e. for the mode of the distribution is 71 (when the largest eigenvalue 
of E p is unique). We can regard an estimate of the concentration about the 
mode to be u)±, and if &\ ~ 1 the data are highly concentrated. The sample 
eigenvector jj is the (j — l)st sample principal component with estimated 
variance Uj, j = 2, . . . ,p. 

Another option for practical analysis is to consider the special case with 
eigenvalues (8). Choose h <n and fix the projection matrix P^ in advance 
(e.g., using h Fourier or spline basis functions). Then, as p — > 00 (fixed h), 

from Theorem 3.4, v p i = p 1 ^ 2 P/[ x p i ^* Nh(0, E/i), i = 1, . . . ,re, where E^, = 
P^YtPh. The m.l.e. of E/j is E^ = ^ Y^l=i v pi v pi an d the distribution of E^ is 
a Wishart distribution (e.g., [22], page 85). Expressions for the joint density 
of the sample eigenvalues can be written down using the two-matrix qFq 
hypergeometric function (from [14]) and a large sample approximation is 
given by G. A. Anderson [1] — see [22], pages 388, 392. The joint distribution 
of sample eigenvalues and eigenvectors of covariance matrices of Gaussian 
data is known for all n,p but difficult to work with (e.g., see [22]). Hence, 
we consider useful approximations for large re, p. 

The asymptotic joint distribution of the eigenvalues and eigenvectors of 
E/,, for large n is given by the classical result of T. W. Anderson [2], and 
we require p/n 2 — ► 00 and n — ► 00 for this result to hold (with h fixed). 
The details are as follows. Assume for now that the eigenvalues of E^ are 
distinct A^ > A2 > • • • > A^ > with corresponding eigenvectors 7^ , 
j = 1, . . . , h. From [2] as n — ► 00, p/n 2 — ► 00 we have 

(15) re^Af - Xf )) Z N(0,2(xf ) ) 2 ), j = l,...,h, 
independently, and 

(16) nV^-^i^O,^), 
where 

v j~ A j 2s ,.(h) x {h),^k v7fc J > j — 1, . . . , n, 



^2 



and ,Aj are all asymptotically independent. Similar results follow when 
there are some multiplicities of eigenvalues, using [2] again. 

Asymptotic distributions for dimension p fixed and n — > 00 are summa- 
rized by Mardia and Jupp ([21], page 187) and Watson [28]. If we now let 
p — ► 00 and n/p — > 00, then we have a consistency result. 
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Proposition 3.6. Consider the Bingham (H p /p) distribution on S' p ~ 1 (l) 
with S p = (Ip — 2B p )~ 1 and m.l.e. E p . As p — > oo,n— > oo and np~ l — > oo, 
then S p -n- S p — > S . 

Proof. Since S p = S p + Op^ 1 / 2 ?!" 1 / 2 ) and as n/p — > oo, we have £ p 
Ep — * E as p — > oo . □ 

Other results for p fixed and n — > oo are worth investigating for p — > oo 
and n/p — > oo, for example, the central limit results of Watson [28], Fisher, 
Hall, Jing and Wood [10] and Bhattacharya and Patrangenaru [5]. 

4. Other distributions. We now consider results for other high-dimensional 
distributions which are useful in directional data analysis and shape analy- 
sis. Table 1 provides a summary of the notation used in the paper for the 
different measures, and the limiting path processes and noises. 

4.1. Uniform distribution. Let Ph be a p x h matrix so that P^x p is 
the h- vector of the first h components of x p . Stain [26] showed that if x p is 
uniformly distributed on 5' p_1 (l), then 

p l ' 2 P^x p ^N h {^I h ) asp^oo. 

The result also holds for any p x h matrix Ph of h orthonormal columns. 
Theorem 2.1 provides the extension to the infinite-dimensional case and we 

have x p W as p — > oo. 



Table 1 

Notation used in the paper for the different measures, the limiting path processes and 

limiting noise 







Measures 








Distribution 


(a) 


(b) 


(c) 


Limiting path process 


Limiting noise 


Uniform 








W 


W 


Bingham 


AtS,p,E 


Hw,p,s 


MO.E 


Wo,E 


W ,E 


von Mises-Fishcr 


/^V,p,I>,K 










Fisher-Bingham 








W f ,E 




Complex uniform 




c 


c 


vv c 


vv c 


Complex Bingham 




MvK,p,E 


Mo.E 


Wo C ,E 


W C ,E 



Column (a) denotes measures for Borel sets on S' p ^ 1 (l) or CS P-1 (1) in the complex case, 
column (b) denotes measures for Borel sets on C using the continuous piecewise linear 
approximation Q v of (1) and column (c) denotes the limiting Gaussian measure on C as 
p — » oo. The final columns show the limiting path Gaussian process on C and the limiting 
noise. 
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4.2. von Mises-Fisher distribution. Watson [27, 29] considered the fixed 
rank case for the von Mises-Fisher distribution (which Watson called the 
Langevin distribution). Let x p have a von Mises-Fisher distribution with 
parameters given by the mode v p 6 S' p ~ 1 (l) and concentration p l / 2 K. The 
density with respect to uniform measure on S P ~ 1 {1) is 



where 



/ 1/2 Nl-p/2 

cv(p 1/2 k) = (^) Tb/2)I pftr . x tf/*K), 

with the modified Bessel function of the first kind and order j £ R + 
(e.g., see [21], page 168) and where T(-) is the gamma function. Note that this 
von Mises-Fisher distribution can be regarded as the multivariate normal 
distribution Np^KVp/p 1 / 2 , I p /p) conditioned to have unit norm. 
Watson [27] showed that for this von Mises-Fisher distribution 

p 1/2 P£x p % N h (Pj[vpK, I h ) as p ^ oo, 

for any px h matrix Ph of h orthonormal columns spanning a subspace con- 
taining z/p. We write z p = x p — KVpjp 1 ! 2 and lim^oo Kv p /p 1 / 2 = n E L 2 ([0, 1]). 
Since 

fv,p{z P ,Vp,p l,2 n) -> fc(y ' = plim^ Qp(z p ,-);0,ij as p -> oo , 

and using a similar argument to that in the proof of Proposition 3.5, it follows 
that z p -3- >V. Equivalently, consider the probability measure fJ,w,p,v,n on C 
of a Borel set D: 

^•W,p,U,K 

then Hw,p,v,k weakly as p— > oo, where £(t) = Jq rj(s)ds. From [8] the 

probability density function of the shifted measure is given by the Cameron- 
Martin (Girsanov) formula 

i 1 ri 



^(Y)=exp{/ v (t)dW(t)-\ f\(t) 2 dt\ 
dfjiw Uo 2 Jo J 



which can also be seen using Shepp's [25] result of Theorem 2.3 in this special 
case where a,j = for all j = 1, . . . , oo. 

The practical implication is that we can choose fixed h<n, use Ph as any 
suitable choice of h basis functions, and then carry out inference using v p = 
p 1 ' 2 PfoXp/h 1 ~/ 2 ~ N^KP^Vp/h 1 / 2 ,Ih/h). In particular, if v p i, . . . ,v pn are a 
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random sample from this multivariate normal distribution, then the m.l.e.'s 



arc 



(17) 



K ■ 



i=l 



i=l 



i=l 



i=l 



Also, k 2 ~ -xKnK 2 ) (which was given by Watson [29]) and P^fp has an 
offset Gaussian distribution ([21], page 178). Also, from [29] if we write 

cosp = (pT v p ) T P^Vp, then 



nk 2 p 2 ~xl-i asp,n->oo, 



where p/n 2 — > oo. 



4.3. Watson distribution. Again let P^ x p select the first h points from 
the p- vector x p , where x p S 5^ (1). Let x p have a distribution with density 
with respect to the uniform measure on S ,p ~ 1 (l) given by 

C\v(P K ) e ^P(P K \\ p h x P \\ 2 ), 

where cw(j>k) = 1-^1(2 > 2>P K ) 1S nere ^ ne confluent hypergeometric function 
with scalar argument (see [21], page 181). Watson [27] showed that, for fixed 
h, under this distribution 

(1 - 2K) l ' 2 p 1 ' 2 P^Xp Z N h (0, I h ) as p 00, 

when k < 1/2. (Note that it seems clear that there is a typographical error 
in (47) of [27], where the square root of (1 — 2k) was not taken.) 

The Watson distribution is a special case of the Bingham distribution, 
and a suitable choice of matrix sequence that satisfies (3)-(5) is S" 1 = 
I p — 2nPf l P^ , which is positive definite if k < 1/2 and P^ is any px h matrix 
of orthonormal columns (note B p = kP^PT). From Theorem 3.4, for this 
particular Bingham distribution 

p l/2 P^x p Z Nh (0, (1 - 2k)- 1 4) as p - oo, 

if k < 1/2. Hence, Watson's result is confirmed as a special case of Theo- 
rem 3.4. 

The case where h = 1 is commonly encountered in directional data anal- 
ysis with parameters k, Pi, with modes at ±P± for k > 0, and isotropically 
distributed about these modes. 
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4.4. Fisher-Bingham distribution. Similar high-dimensional results fol- 
low for the Fisher-Bingham distribution ([19], [21], page 174). The parame- 
ters of the distribution are the mode u p , a concentration parameter and a ma- 
trix (with constraints) specifying the structure of variability about the mode. 
Consider the parameterization where the Fisher-Bingham {y p ^/ 2 K,pB^) 
distribution has density with respect to the uniform measure on S p {1) 
given by 

d^F,p,u,K,T, , n / 1/2 D \— 1 / 1/2 t , T D \ 

— -f- (Xp) = c F (u p ,p 1 K,pBp) exp(p i kx Vp+px BpXp), 

where u p is one of the first h eigenvectors of B p , and we shall consider 
S p = (Jp — 2£>p) _1 to be positive definite. The integrating constant 

can be expressed in terms of the density of a linear combination of noncen- 
tral Xi random variables [18], which can be evaluated using a saddlepoint 
approximation. The Fisher-Bingham (y pi p 1 / 2 K,pB p ) distribution can be re- 
garded as N^Hpi/p/p 1 / 2 , Sp/p) conditioned to have norm 1. 

Proposition 4.1. If x p has a Fisher-Bingham {v pi p 1 / 2 K,pB p ) distri- 
bution on S P ~ 1 {1), with Vp one of the first h eigenvectors of B p and positive 
definite T, p = (I p — 2B p )~ l , then 

(18) p^P^-^PhPTxp^Nhi^h) asp^oo, 

where P^ is the p x h matrix with columns given by the first h eigenvectors 
of Bp and <j> = limp-^ kP% T, p P h P^ v p . 

PROOF. Let x p = tx v + (1 - t 2 ) l / 2 x^, where OC u IS 3j unit vector in the 
subspace V of MP spanned by the first h eigenvectors of S p , unit 
vector in the orthogonal complement of V and t = \\xh\\ is the norm of 
Xh = tx v = PhPf[xp, which is the part of x p in V . An invariant measure on 
S p (1) may be written as 

see [27]. So, the Fisher-Bingham measure with parameters v p ,p l / 2 n, pB p = 
p(I p — Sp 1 )/2 in terms of (t,x v ,x^) is proportional to 

/ 1/2, t t2 P T V -1 , *M 
exp i Kp 1 tx v Up - — x v Sp x v + — \ 

x (1-1(1 - t 2 )^' 2 - 1 dt»s, h {dx v )ns, P - h {dxi). 
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Note is independently uniformly distributed. Writing u = p l / 2 t and in- 
tegrating out we have the joint density of (u,x v ) as 

f{u,x v ) oc u fc-1 (l - t(Vp)^ _ft)/2_1 expf/ttu^Vp - yxJSp -1 ^ + yY 

Let y = p l / 2 P^Tj p 1 ^ 2 P^P^Xp be the /i- vector such that y T y = u 2 x^T 1 ~ l x v . 
Hence transforming from (it, to y and with Jacobian proportional to 
u l ~ h , and noting that (1 — ii 2 /p)( p-/l )/ 2 ~ 1 — > e~ u ' I 2 as p^ oo, we see that 

/(y) oc (1 - u 2 /^-^ 2 " 1 exp - ly^ + y ) 

-» exp|-^(y - </>) T (y - 0)| 

as p — > oo, where P = kP^ UpPhP^ v v . Hence y —> Nh(4>, Ih) as required. □ 

Note that if B p = 0, then the result reduces to the result for the von 
Mises-Fisher distribution described in Section 4.2. If v p = 0, then the result 
reduces to Chikuse's [6] result of Theorem 3.4. 

Consider the probability measure nw,p,u,n,T, on C of a Borel set D 

Vw, p ,v,k,t,( d ) = ^F,p,v,K,s{{xp ■ Qp(x p , •) € D}); 

then — ► /i^s weakly as p — > oo, using the same argument as in 
the proof of Proposition 3.5. The limiting measures in particular cases are 
summarized in Table 1. 

4.5. Complex Bingham distribution. The complex unit sphere is written 
C5 ,p_1 (l) and we consider C5 lp_1 (l) = S 2p ~ (1). As p — > oo the uniform 
measure on CS°°(1) induces a Wiener process on C. In this case we write 
W c for the Wiener process using complex notation. If Z is complex white 
noise which induces this Wiener process W c on C, then we write Z ~ W c . 

The complex Bingham family of distributions is the complex analogue of 
the real Bingham distribution [17]. The complex Bingham distributions are 
particularly useful in shape analysis of landmarks in two dimensions (e.g., 
see [9]), where the distribution is used for rotation-invariant shape modeling 
because the density has the property that f(z) = f{e ld z) for any rotation 
9. The complex Bingham distribution is actually a special case of the real 
Bingham distribution [17]. 

The high-dimensional results for the complex Bingham proceed in an 
analogous way to the real Bingham case, with inner product replaced by 
(z,w) = z*w, where z* = z T is the transpose of the complex conjugate. Pos- 
itive (semi-) definite symmetric matrices are replaced by positive (semi-) 
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definite Hermitian matrices, and positive (semi-) definite self-adjoint linear 
operators are replaced with positive (semi-) definite Hermitian linear oper- 
ators. The complex Bingham (pB p ) family of distributions on CS' P ~ 1 (1) has 
probability measure 

d/J>B,p,x = ccb{pB p )~ 1 exp(pz*B p z p ) d^L c Sp , 

where z p £ CS' P ~ 1 (1), [i c Sp is the uniform probability measure on CS P (1), 
pB p is Hermitian and 

v 

ccb{pB p ) = 2tt p b j ex P T j> bj 1 = Y[( T j ~ T i), 

in the case when the real eigenvalues r, of pB p are all distinct. 

Proposition 4.2. Let z p have a complex Bingham (pB p ) distribution. 
Consider the sequence of Hermitian positive-definite matrices S p = (I p — 
B p )~ x , p = 1,2, . . . ,oo, which satisfy (3)-(5) and let P^ = [71, . . . ,7/1], where 
7h are complex eigenvectors o/S p . By direct analogy with Theorem 3.4 we 
have 

p^P^-^PhP^Zp Z £N h (0, I h ) as p - 00. 

We can use the complex normal approximation to the high-dimensional 
complex Bingham distribution and carry out inference in an analogous way 
to the procedure for the real Bingham distribution in Section 3.3. Weak 
convergence of the complex Bingham measure to a Gaussian measure as 
p — > 00 follows directly from Proposition 3.5, as the complex Bingham is a 
special case of the real Bingham. 

4.6. Complex Watson. The complex Watson distribution is a special case 
of the complex Bingham distribution with S" 1 = I p — Kfi/i* (see [20]). The 
distribution is useful in planar shape analysis as an isotropic distribution 
about the modal shape \i. As the form of the density is particularly simple 
in this case, we shall compare the high-dimensional complex Watson distri- 
bution with the complex normal approximation for various p. Consider a 
particular form of the complex Watson density given by 

fcw(zp) = ccw~ 1 {k) ex.p{-pz*(I p - Kfj,ff)z p }, 

where 

Ccw (k) = 27T p 1 F 1 (l;p-K P )e-P/(p-l)l. 

Now, as p — > 00, iFi(l;p; np) — ► (1 — k) _1 , and so using Stirling's approxi- 
mation, 
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Table 2 

Values of \og{ccw / cn {k)) for different p,K 



p 


0.02 


0.2 


0.4 


0.6 


0.8 


0.9 


0.98 


0.998 


2 


0.04148 


0.05783 


0.12564 


0.29834 


0.74630 


1.31239 


2.81813 


5.09713 


5 


0.01671 


0.02567 


0.06778 


0.19128 


0.56143 


1.07649 


2.53515 


4.80278 


10 


0.00837 


0.01354 


0.04005 


0.12750 


0.42875 


0.89228 


2.29969 


4.55444 


20 


0.00419 


0.00700 


0.02247 


0.07944 


0.30906 


0.71003 


2.04892 


4.28558 


50 


0.00167 


0.00287 


0.00982 


0.03853 


0.18134 


0.48686 


1.70299 


3.90438 


100 


0.00084 


0.00145 


0.00508 


0.02098 


0.11193 


0.34247 


1.43873 


3.60139 


1000 


0.00008 


0.00015 


0.00053 


0.00231 


0.01526 


0.06727 


0.64364 


2.55192 


10000 


0.00001 


0.00001 


0.00005 


0.00023 


0.00160 


0.00792 


0.16451 


1.52600 


100000 


0.00000 


0.00000 


0.00001 


0.00002 


0.00016 


0.00081 


0.02268 


0.66978 



Since there is a constraint ||z p || = 1, we take the singular complex normal 
approximation in 2p — 1 real dimensions of variability. We can write the 
density as 

f N (z) = cn' 1 ^) exp{-pz*{I p - Kfifi*)z p }, 

where 

C N (K)=V27T P ' 1 / 2 \^ p / P \ g , 

where \T, p /p\ g is the determinant in the 2p—l real dimensions of variability 
given by \T, p /p\ g =p~ p ~ 1 / 2 /(l — k). Hence, 

ccw(k) =c n (k)(1 + 0{p~ 1 )). 

In Table 2 we see some numerical comparisons of \og(ccw ( K ) I cn («)) for 
different p, k. Note that the approximation is better when k is small. For 
very high concentrations close to 1 a very large value of p is required for a 
good approximation. 

5. Practical application: brain shape modeling. Shape is the geometrical 
information that remains when translation, rotation and scale effects are 
removed [16]. We consider an application where the shape of the cortical 
surface of the brain is of interest. The data form part of a larger study with 
collaborators Bert Park, Antonio Gattone, Stuart Leask and Sean Flynn 
that will be reported elsewhere. 

A sample of n = 74 MR images of adult brains is taken. The brains are 
preregistered into a standard frame of reference (Talairach space) and so 
location and rotation are regarded as fixed — see Figure 1 for an example. 

We actually restrict the analysis to the p = 62,501 points on the cortical 
surface along a hemisphere of rays which radiate from the origin at a central 
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Fig. 1. An example brain showing the points on the surface. In the analysis we restrict 
ourselves to the upper hemisphere of the cortex only (above the origin landmark) and 
consider p = 62,501 points. 



landmark (midway between the anterior and posterior commissures). The 
measurements taken for the ith brain (i = 1, . . . ,n) are {r p i(t) :t = 1, . . . 
which are the lengths of the rays measured at the locations {6(t) :t= 1, . . . ,p} 
on the upper hemisphere, that is, 9(t) G Since {6(t) :t = 1, . . . ,p} are 

fixed and equal for all the brains, our data for the ith brain are solely 
the ray lengths, which we write as the p-vector r p i = (r p j(l), . . . , r p i{p)) T , 
i = 1, . . . ,n. We remove the scale information by taking x p { — ^p^/ll^pill? so 
that = 1 for i = 1, . . . ,n. Since the location and rotation are treated 

as fixed, this application involves statistical analysis on a high-dimensional 
sphere rather than in shape space itself. 
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We wish to obtain an estimate of the modal cortical shape and the prin- 
cipal components of shape variability for the dataset. We initially consider 
a model for the data as the high-dimensional Bingham distribution, and use 
the multivariate normal approximations from (14). We consider maximum 
likelihood estimation as in Section 3.3, and the parameters of the model are 
given by X p estimated by S p = £ Yh=i x pi x pi = pS, say- 
First we need to be able to compute the spectral decomposition in high- 
dimensional spaces. In the case where we have n <C p, the eigenvalues and 
eigenvectors can be computed using a straightforward procedure. Let us 
write X = [x p i, . . . , x pn ] for the n columns of vectors from a random sample. 
Now, using the spectral decomposition we have 

1 n 

3=1 

Consider the n x n matrix A = ^X T X, and the spectral decomposition is 
A = Sj=i ^jQjlJ \ which can be computed in 0(n 3 ) steps. Now 

1 ™ 

s 2 = - 2 xx T xx T = ^2^J 

3=1 

1 n A 

= l -XAX T = Y J - 1 (X qj )(X q3 ) T . 

3=1 

Hence, by equating coefficients, 

Tj = x <lj/\\ x <lj\\, &j = WXqjWVdj/n, j = l,...,n. 

Thus calculating the PCs is practical for huge p> n. Practical statistical 
analysis is carried out by choosing a low number of PCs which hopefully 
summarize a large percentage of variability, and then carrying out multi- 
variate tests in the reduced space. 

So, returning to the cortical brain surface example, we stack the p ra- 
dial lengths into vectors of length p = 62,501, and since we are not inter- 
ested in size we divide through by the norm of each stacked vector, to give 
Xpi = rpi/\\r p i\\ S »S p-1 (l), i = 1, ... ,n. We then obtain the spectral decom- 
position of S = T'p/p. The data are extremely concentrated, with a very high 
contribution from the first eigenvector = 0.99885). 

~ 1 /2 /v a 1/2 ^ 

We display a)/ 71 ± 3a> 2 72 m Figure 2, which shows the mode cor- 
tical surface shape ± 3 standard deviations along the first PC, for each 
of three orthogonal views. Note that this PC appears to show variability 
in the location of the origin landmark relative to the surface. This PC ex- 
plains 100u)2/ Y^i=2 &i = 26.9% of the variability about the mode. We display 

~ 1 /2 ~ a 1/2 /s 

w 1 71 ± 3cj 3 73 in Figure 3, which shows the mode cortical surface shape 
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Fig. 2. Plots of the modal cortical shape ± 5 standard deviations along PCI: (a) Sagittal 

1/2 1/2 1/2 

wew. Lighter gray: lj^ yi; darker gray: lu^ 71 +ti 2 72. (b) Sagittal view. Lighter gray: 
lu^ 2 ji: darker gray: d\J 2 ^\ ~lu^ 2 J2- (c) Axial view. Lighter gray: u>]/ 3 yi; darker gray: 

1/2 1/2 1/2 1/2 1/2 

(£>]/ 71 + a) 2 72. (d) Axial view. Lighter gray: u)^ 71; darker gray: lj^ 71 — ui 2 72- (e) 

1/2 1/2 1/2 

Coronal view. Lighter gray: lu 1 71; darker gray: <£>]/ ji+lo 2 72- (f) Coronal view. Lighter 

A 1 / 2 A 1 / 2 A A 1 / 2 A 

71; darker gray: 71 —ui 2 72- Additional shading has been added so that the 



gray: to 

higher the distance above the horizontal base (the line joining the anterior and posterior 
commissures) the lighter the shade of gray. 



± 3 standard deviations along the second PC, for each of three orthogo- 
nal views. Note that this PC is largely picking up "taller" "thinner" brains 
versus "shorter" "fatter" brains. This PC explains lOOc^/EiU^ = 12,8% 
of the variability about the mode. Note that the modal shape can only be 
identified up to a reflection, but in this case the correct choice is obvious. 

It could be argued that the Fisher-Bingham is a more appropriate model 
here given that we have the reflection information in our data. In this case 
the high-dimensional approximation is the multivariate normal distribution 
with nonzero mean. The estimated parameters of the approximating model 

are the sample mean and sample covariance matrix, and for this example 

1/2 

the sample mean and tD 1 71 are indistinguishable up to machine accuracy, 
and so the conclusions are identical. 




6. Discussion. The noise models considered in the paper should have fur- 
ther applications in addition to those in high-dimensional directional data 
analysis and shape analysis. For example, the work could be used to model 
noise in (high-dimensional) images where the parameters of the noise pro- 
cess would depend on the particular imaging modality and the object (s) in 
the image. The models could be suitable for nonstationary and long-range 
correlation noise. There is a large literature on stochastic models in image 
analysis, and particularly successful models include Markov random field 
models (e.g., [3, 12]) and intrinsic random fields [4]. Our models have far 
more parameters in general, and so their use as image noise models would 
be restricted to situations where there is a reasonable amount of training 
data (or strong prior knowledge) available. 

In the brain application the points on the cortical surface provide a rough 
correspondence of parts. An improved analysis would be to locate points at 
more accurate points of biological homology, and then the mean shape and 
principal components would give more accurate estimates of the population 
properties of the cortical surfaces. Such a task is far from straightforward. 
However, our approach does give an approximate assessment of the main 
global features of brain shape and variability. 
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We have considered the size of an object x p to be \\x p \\, but other choices 
are possible which would change the practical analysis. For example, with 
the brain application one might fit a smooth surface x to a brain using 
a finite series of orthogonal functions and then take the size as ||x||. Two 
brains which look to be quite similar in size with similar \\x\\ values could 
have rather different \\x p \\ values if one is a much rougher surface than the 
other. 

For inference we discussed the cases p/n — ► oo and n/p — ► oo in Section 3.3. 
The asymptotic regime n/p — > 7 fixed as n — > 00, p — > 00 is of great interest 
in many disciplines, including mathematical physics — see [15]. In particular, 
Johnstone [15] describes developments based on the Tracy-Widom distribu- 
tion for the largest eigenvalue, and associated work. 

As mentioned in the Introduction, the analysis of functions is somewhat 
different from our situation due to the smoothness assumptions that are 
usually made in FDA. The models for the induced paths in C are of more 
relevance to FDA, where the functions are of a standard type and continuity 
is present. 

It is of interest to extend the work to other manifolds, in particular the 
Stiefel manifold of orthonormal frames and the Grassmann manifold (which 
is appropriate for affine shape). Watson [27] provides some asymptotic high- 
dimensional results, and in particular, p 1 ! 2 multiplied by the first h rows of 
a uniformly distributed matrix X on the Stiefel manifold V m;P tend to an 
/im-dimensional zero-mean Gaussian distribution with identity covariance 
matrix as p — > 00. Chikuse [6, 7] provides many extensions. However, the 
study of probability distributions in the continuous limit as h — > 00 requires 
further developments. 

Acknowledgments. The author would like to thank two anonymous re- 
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